Phonetic Variation Analysis Via Multi-Factor Sparse Plus Low Rank Language Model

Authors

Curtis Fielding

Joshua Weaver

Brian Hutchinson

Abstract

Phonetic transcriptions contain rich information about language. First, the sequential patterns in phonetic transcripts reveal information about the language’s phonotactics. When combined with lexical information, this can help to grow or correct pronunciation dictionaries and to improve grapheme-to-phoneme prediction. Second, the places where pronunciations deviate from the norm can be equally informative, for example by providing cues for speaker traits such as accent, dialect or sociolect. Detecting speaker characteristics is interesting in itself and can also be used to improve speech recognition system performance (Biadsy, 2011). In this extended abstract we describe ongoing work to automatically analyze both the regularities and the exceptions (deviations) in phonetic sequences. We use the Multi-Factor Sparse Plus Low Rank Language Model (Hutchinson et al., 2013), which was shown to effectively model regularities and exceptions in word sequences (e.g., by identifying lexical deviations characteristic of topic or speaker role). Preliminary results modeling commonalities and variation between dialects of American English are promising and suggest several extensions to this work.
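The "regularities plus exceptions" idea can be pictured with a toy matrix decomposition. The sketch below is not the authors' language model: it assumes a plain matrix `M` and splits it into a low rank part `L` and a sparse part `S` by alternating two standard proximal steps (singular value thresholding and elementwise soft thresholding) on a least-squares objective with nuclear-norm and l1 penalties. All function names, the penalty weights `lam_nuc` and `lam_l1`, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def soft_threshold(X, tau):
    """Elementwise shrinkage toward zero: the proximal step for the l1 penalty."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Shrink the singular values of X by tau: the proximal step for the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def objective(M, L, S, lam_nuc, lam_l1):
    """0.5 * ||M - L - S||_F^2 + lam_nuc * ||L||_* + lam_l1 * ||S||_1."""
    fit = 0.5 * np.linalg.norm(M - L - S, "fro") ** 2
    return fit + lam_nuc * np.linalg.norm(L, "nuc") + lam_l1 * np.abs(S).sum()


def sparse_plus_low_rank(M, lam_nuc, lam_l1, n_iter=100):
    """Alternate exact minimization over L (with S fixed) and S (with L fixed)."""
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    history = []
    for _ in range(n_iter):
        L = singular_value_threshold(M - S, lam_nuc)  # regularities
        S = soft_threshold(M - L, lam_l1)             # exceptions
        history.append(objective(M, L, S, lam_nuc, lam_l1))
    return L, S, history


# Hypothetical data: a rank-one pattern plus two isolated spikes.
rng = np.random.default_rng(0)
M = np.outer(rng.normal(size=8), rng.normal(size=6))
M[1, 2] += 5.0
M[4, 0] -= 5.0

L, S, history = sparse_plus_low_rank(M, lam_nuc=0.5, lam_l1=0.5)
print("nonzero entries in S:", int(np.count_nonzero(S)))
print("objective, first and last iteration:", history[0], history[-1])
```

In the language-model setting described in the abstract, the decomposed object would be a model parameter matrix rather than a data matrix, and the fit term would be a likelihood rather than a squared error; the toy version only illustrates how the two penalties pull structure into separate components.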

Similar resources

Noisy Matrix Decomposition via Convex Relaxation: Optimal Rates in High Dimensions, by Alekh Agarwal, Sahand Negahban and ...

We analyze a class of estimators based on convex relaxation for solving high-dimensional matrix decomposition problems. The observations are noisy realizations of a linear transformation X of the sum of an (approximately) low rank matrix with a second matrix endowed with a complementary form of low-dimensional structure; this set-up includes many statistical models of interest, including factor...

Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions

We analyze a class of estimators based on convex relaxation for solving high-dimensional matrix decomposition problems. The observations are noisy realizations of a linear transformation X of the sum of an (approximately) low rank matrix Θ⋆ with a second matrix Γ⋆ endowed with a complementary form of low-dimensional structure; this set-up includes many statistical models of interest, including ...

A Sparse Plus Low Rank Maximum Entropy Language Model

This work introduces a new maximum entropy language model that decomposes the model parameters into a low rank component that learns regularities in the training data and a sparse component that learns exceptions (e.g. multiword expressions). The low rank component corresponds to a continuous-space language model. This model generalizes the standard ℓ1-regularized maximum entropy model, and has ...

Comparison of Spectral Methods for Spoken Language Identification

Automatic spoken language identification is the task of identifying a language from the speech signal. Language identification systems can be divided into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...

Identifying Broad and Narrow Financial Risk Factors with Convex Optimization

Factor analysis of security returns aims to decompose a return covariance matrix into systematic and specific risk components. To date, most commercially successful factor analysis has been based on fundamental models, although there is a large academic literature on statistical models. While successful in many respects, traditional statistical approaches like principal component analysis and m...

Publication date: 2014
